Detecting sarcasm and verbal irony from people's subjective statements is crucial to understanding their intended meanings and real sentiments and positions in social scenarios. This paper describes the X-PuDu system that participated in SemEval-2022 Task 6, iSarcasmEval - Intended Sarcasm Detection in English and Arabic, which aims at detecting intended sarcasm in various settings of natural language understanding. Our solution finetunes pre-trained language models, such as ERNIE-M and DeBERTa, under the multilingual settings to recognize the irony from Arabic and English texts. Our system ranked second out of 43, and ninth out of 32 in Task A: one-sentence detection in English and Arabic; fifth out of 22 in Task B: binary multi-label classification in English; first out of 16, and fifth out of 13 in Task C: sentence-pair detection in English and Arabic.
translated by 谷歌翻译
我们引入了基于高斯工艺回归和边缘化图内核(GPR-MGK)的探索性主动学习(AL)算法,以最低成本探索化学空间。使用高通量分子动力学模拟生成数据和图神经网络(GNN)以预测,我们为热力学性质预测构建了一个主动学习分子模拟框架。在特定的靶向251,728个烷烃分子中,由4至19个碳原子及其液体物理特性组成:密度,热能和汽化焓,我们使用AL算法选择最有用的分子来代表化学空间。计算和实验测试集的验证表明,只有313个(占总数的0.124 \%)分子足以训练用于计算测试集的$ \ rm r^2> 0.99 $的精确GNN模型和$ \ rm rm r^2>>实验测试集0.94 $。我们重点介绍了提出的AL算法的两个优点:与高通量数据生成和可靠的不确定性量化的兼容性。
translated by 谷歌翻译
多模式情感分析和抑郁估计是两个重要的研究主题,旨在使用多模式数据预测人类精神状态。先前的研究重点是制定有效的融合策略,以交换和整合不同模式的与思想有关的信息。一些基于MLP的技术最近在各种计算机视觉任务中取得了巨大的成功。受到这一点的启发,我们探索了本研究中具有混合视角的多模式方法。为此,我们介绍了完全基于MLP的多模式特征处理框架CubeMLP。 CUBEMLP由三个独立的MLP单元组成,每个单元都有两个仿射转换。 CUBEMLP接受所有相关的模态特征作为输入,并在三个轴上混合它们。使用CubeMLP提取特性后,将混合的多模式特征扁平以进行任务预测。我们的实验是在情感分析数据集上进行的:CMU-MOSI和CMU-MOSEI,以及抑郁估计数据集:AVEC2019。结果表明,CUBEMLP可以以低得多的计算成本来实现最先进的性能。
translated by 谷歌翻译
本文介绍了一种新的方法,为入境驾驶场景的自动车辆产生最佳轨迹。该方法使用两相优化过程计算轨迹。在第一阶段中,优化过程产生具有不同的曲率的闭形驾驶导向线。在第二阶段,该过程将驱动导向线作为输入输出,输出沿着导向线驾驶的车辆的动态可行,混蛋和时间最佳轨迹。该方法对于在弯曲道路上产生轨迹特别有用,其中车辆需要频繁加速和减速以适应离心机加速限制。
translated by 谷歌翻译
路径规划是自治车辆运动规划中的关键组成部分。路径指定车辆将旅行的几何形状,因此,对安全和舒适的车辆运动至关重要。对于城市驾驶场景,自治车辆需要能够在杂乱的环境中导航,例如,道路部分被侧面挡住的车辆/障碍物。如何生成运动学上可行和平滑的路径,可以避免复杂环境中的碰撞,使路径规划有挑战性的问题。在本文中,我们提出了一种新型二次编程方法,可以产生分辨率完全碰撞避免能力的最佳路径。
translated by 谷歌翻译
Federated learning allows edge devices to collaboratively learn a shared model while keeping the training data on device, decoupling the ability to do model training from the need to store the data in the cloud. We propose the Federated matched averaging (FedMA) algorithm designed for federated learning of modern neural network architectures e.g. convolutional neural networks (CNNs) and LSTMs. FedMA constructs the shared global model in a layer-wise manner by matching and averaging hidden elements (i.e. channels for convolution layers; hidden states for LSTM; neurons for fully connected layers) with similar feature extraction signatures. Our experiments indicate that FedMA not only outperforms popular state-of-the-art federated learning algorithms on deep CNN and LSTM architectures trained on real world datasets, but also reduces the overall communication burden. 1 * Work performed while doing an internship at IBM Research.
translated by 谷歌翻译
With increasing scale, large language models demonstrate both quantitative improvement and new qualitative capabilities, especially as zero-shot learners, like GPT-3. However, these results rely heavily on delicate prompt design and large computation. In this work, we explore whether the strong zero-shot ability could be achieved at a smaller model scale without any external supervised data. To achieve this goal, we revisit masked language modeling and present a geometry-guided self-supervised learning method (Go-tuningfor short) by taking a small number of task-aware self-supervised data to update language models further. Experiments show that Go-tuning can enable T5-small (80M) competitive zero-shot results compared with large language models, such as T5-XL (3B). We also apply Go-tuning on multi-task settings and develop a multi-task model, mgo-T5 (250M). It can reach the average performance of OPT (175B) on 9 datasets.
translated by 谷歌翻译
Diffusion model, a new generative modelling paradigm, has achieved great success in image, audio, and video generation. However, considering the discrete categorical nature of text, it is not trivial to extend continuous diffusion models to natural language, and text diffusion models are less studied. Sequence-to-sequence text generation is one of the essential natural language processing topics. In this work, we apply diffusion models to approach sequence-to-sequence text generation, and explore whether the superiority generation performance of diffusion model can transfer to natural language domain. We propose SeqDiffuSeq, a text diffusion model for sequence-to-sequence generation. SeqDiffuSeq uses an encoder-decoder Transformers architecture to model denoising function. In order to improve generation quality, SeqDiffuSeq combines the self-conditioning technique and a newly proposed adaptive noise schedule technique. The adaptive noise schedule has the difficulty of denoising evenly distributed across time steps, and considers exclusive noise schedules for tokens at different positional order. Experiment results illustrate the good performance on sequence-to-sequence generation in terms of text quality and inference time.
translated by 谷歌翻译
In this paper, we introduce a novel optimization algorithm for machine learning model training called Normalized Stochastic Gradient Descent (NSGD) inspired by Normalized Least Mean Squares (NLMS) from adaptive filtering. When we train a high-complexity model on a large dataset, the learning rate is significantly important as a poor choice of optimizer parameters can lead to divergence. The algorithm updates the new set of network weights using the stochastic gradient but with $\ell_1$ and $\ell_2$-based normalizations on the learning rate parameter similar to the NLMS algorithm. Our main difference from the existing normalization methods is that we do not include the error term in the normalization process. We normalize the update term using the input vector to the neuron. Our experiments present that the model can be trained to a better accuracy level on different initial settings using our optimization algorithm. In this paper, we demonstrate the efficiency of our training algorithm using ResNet-20 and a toy neural network on different benchmark datasets with different initializations. The NSGD improves the accuracy of the ResNet-20 from 91.96\% to 92.20\% on the CIFAR-10 dataset.
translated by 谷歌翻译
Language models with the Transformers structure have shown great performance in natural language processing. However, there still poses problems when fine-tuning pre-trained language models on downstream tasks, such as over-fitting or representation collapse. In this work, we propose HyPe, a simple yet effective fine-tuning technique to alleviate such problems by perturbing hidden representations of Transformers layers. Unlike previous works that only add noise to inputs or parameters, we argue that the hidden representations of Transformers layers convey more diverse and meaningful language information. Therefore, making the Transformers layers more robust to hidden representation perturbations can further benefit the fine-tuning of PLMs en bloc. We conduct extensive experiments and analyses on GLUE and other natural language inference datasets. Results demonstrate that HyPe outperforms vanilla fine-tuning and enhances generalization of hidden representations from different layers. In addition, HyPe acquires negligible computational overheads, and is better than and compatible with previous state-of-the-art fine-tuning techniques.
translated by 谷歌翻译